This week we are going to continue working with our field data from Siberia. Specifically, we are going to think about different types of data, statistical distributions, and probabilities. We'll need to be familiar with these concepts in order to perform statistical hypothesis tests in the coming weeks.
To begin, let’s create a new script, fill in the appropriate header information, and set our working directory.
#####################
# probability script
# GEOG250 F19
# MML 10/22/19
#####################
# set working directory to my folder on the server
setwd("Z:/Geog250_F19/loranty/")
# remember that you should be writing good descriptive comments here - these are your notes!
As we begin to think more deeply about our data, it is worth noting the difference between discrete and continuous variables. What is the difference between these two types of data? Can you think of representative examples from the data we used last class?
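One quick way to see the distinction in R is to count how many distinct values a variable takes. The sketch below uses two short vectors of made-up values (hypothetical, not from our Siberia data): a thaw depth, which can take any value within a range, and a tussock presence/absence code, which can only take a few values.

```r
# Hypothetical values for illustration only (not from the field data)
depth   <- c(41.5, 38.2, 45.0, 52.7, 39.9)  # thaw depth: continuous, any value in a range
tussock <- c(1, 0, 0, 1, 0)                 # tussock present (1) or absent (0): discrete

# A continuous variable rarely repeats exactly; a discrete one takes a small set of values
length(unique(depth))    # 5 distinct values
length(unique(tussock))  # 2 distinct values
table(tussock)           # how many times each outcome occurs
```

A small number of distinct values is only a hint, not a proof: a continuous measurement rounded coarsely can also repeat, so always think about what the variable actually measures.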
There happens to be a discrete variable in our thaw depth data set. Let’s read it in and have a look. Which variable in this data set is discrete?
thaw <- read.csv("data/thaw_depth.csv")
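After reading the file in, functions such as `str()`, `summary()`, and a count of distinct values per column can help you spot the discrete variable. Because the real file is not reproduced here, this sketch builds a small hypothetical stand-in for `thaw`; the column names `depth` and the second site name `site_b` are assumptions, and so is the idea that `tussock` is missing (`NA`) at the other site. Check `names(thaw)` for the real column names.

```r
# Hypothetical stand-in for thaw_depth.csv; real column names and values may differ
thaw <- data.frame(
  site    = c("wws", "wws", "wws", "site_b", "site_b"),
  depth   = c(41.5, 38.2, 45.0, 52.7, 39.9),
  tussock = c(1, 0, 1, NA, NA)   # assumed missing where it was not recorded
)

str(thaw)      # column types and the first few values of each column
summary(thaw)  # range of each numeric column, plus a count of NAs

# Number of distinct non-missing values in each column
n_distinct <- sapply(thaw, function(x) length(unique(na.omit(x))))
n_distinct
```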
You may notice that our discrete variable only occurs at one field site represented in this data set (wws), so before we can go further we need to create a subset. Let's do that now using matrix notation and the site variable in a logical statement. Define your data subset as an object called w.thaw.
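Matrix notation in R takes the form `data[rows, columns]`. Putting a logical statement in the row position keeps only the rows where it is `TRUE`, and leaving the column position empty keeps every column. A minimal sketch, using a hypothetical stand-in for `thaw` (the `depth` column and the `site_b` site name are assumptions):

```r
# Hypothetical stand-in for the thaw data frame (column names assumed)
thaw <- data.frame(
  site    = c("wws", "wws", "site_b", "wws", "site_b"),
  depth   = c(41.5, 38.2, 52.7, 45.0, 39.9),
  tussock = c(1, 0, NA, 1, NA)
)

# Rows where site equals "wws", all columns
w.thaw <- thaw[thaw$site == "wws", ]

nrow(w.thaw)          # 3 rows remain
unique(w.thaw$site)   # only "wws"
```

If the `site` column ever contains `NA`, the logical statement returns `NA` for those rows and R adds rows of `NA` to the result; wrapping the condition in `which()` (for example `thaw[which(thaw$site == "wws"), ]`) avoids this.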
For the sake of simplicity, let's assume that our variable, tussock, which indicates the presence or absence of a tussock at our sampling location, has a probability of 0.5 for either outcome. In this case we are interested in knowing whether tussocks are co-located with our thaw depth measurements along the sampling transects. You can see in the photo below that tussocks create an accumulation of soil that sits above the surrounding vegetation and might affect our measurements.
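If each sampling point has a tussock with probability 0.5, and we additionally assume the points are independent of one another, the number of tussocks among n points follows a binomial distribution. The sketch below uses base R's `dbinom()` with a hypothetical n of 4 points to list the probability of every possible count.

```r
p <- 0.5  # probability of a tussock at any one point (assumed in the text)
n <- 4    # hypothetical number of sampling points

# Probability of exactly k tussocks for k = 0, 1, ..., n (binomial distribution)
probs <- dbinom(0:n, size = n, prob = p)
names(probs) <- 0:n
probs     # equals 1/16, 4/16, 6/16, 4/16, 1/16

sum(probs)  # probabilities of all possible outcomes sum to 1
```

With the real data you could set `n` to `nrow(w.thaw)` and compare these probabilities with the observed count, for example `sum(w.thaw$tussock)`, assuming tussock is coded as 1 for present and 0 for absent.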
Here is a picture of some tussock tundra in Alaska (photo S. Hewitt)